Finding and Counting MSTD sets

Geoffrey Iyer, Oleg Lazarev, Steven J. Miller and Liyang Zhang 



Abstract We review the basic theory of More Sums Than Differences (MSTD) sets,
specifically their existence, simple constructions of infinite families, the proof that
a positive percentage of sets under the uniform binomial model are MSTD but not
if the probability that each element is chosen tends to zero, and 'explicit' constructions
of large families of MSTD sets. We conclude with some new constructions
and results on generalized MSTD sets, including among other items results on a
positive percentage of sets having a given linear combination greater than another
linear combination, and a proof that a positive percentage of sets are $k$-generational
sum-dominant (meaning $A, A+A, \dots, kA = A + \cdots + A$ are each sum-dominant).
1 Introduction

Many of the most important questions in additive number theory can be cast as
questions about sums or differences of sets, where the sumset of A and B is

    $$A + B = \{a + b : a \in A,\ b \in B\} \tag{1}$$

and the difference set is

    $$A - B = \{a - b : a \in A,\ b \in B\}. \tag{2}$$

To see this, let $\mathcal{P}$ be the set of primes and $\mathcal{N}_k$ (respectively $\mathcal{N}_k'$) be the set of $k$-th
powers of integers (respectively non-negative integers).

1. The famous Goldbach problem is to prove that every even number may be written
as the sum of two primes; we may interpret this as saying that the even numbers
are contained in $\mathcal{P} + \mathcal{P}$. While this is still open, we do know that all sufficiently
large odd numbers are the sum of three primes. While sufficiently large means
greater than $10^{1346}$ here, we may remove 'sufficiently large' if we assume the
Generalized Riemann Hypothesis [DETZ97].

2. Another example is Waring's problem, which says for each integer $k$ there is an
integer $s$ such that every positive integer is a sum of at most $s$ perfect $k$-th powers.
In other words, there is an $s$ (depending on $k$) such that $\mathcal{N}_k + \cdots + \mathcal{N}_k$ (where
there are $s$ summands) contains all positive integers. While the optimal $s$ for a given
$k$ is not known, it is known that for each $k$ there does exist a finite $s$ (see for
instance [Na96]).

3. Fermat's Last Theorem (proved in [Wi95, TW95]) states that if $n \ge 3$ and $x, y, z$
are integers, then the only solutions to $x^n + y^n = z^n$ have $xyz = 0$. After some
simple algebra we see it suffices to consider the case when $x$, $y$ and $z$ are all
positive, and Fermat's Last Theorem is just the statement that $(\mathcal{N}_n' + \mathcal{N}_n') \cap \mathcal{N}_n'$
is empty for $n \ge 3$.

The three examples above all involve determining what elements are in sums of
sets; it is also interesting to see how often a given element is represented in a sum.
For example, the Twin Prime Conjecture is the assertion that there are infinitely
many primes differing by 2; this is equivalent to asking how often 2 is obtained in
$\mathcal{P}_x - \mathcal{P}_x$, where $\mathcal{P}_x$ is the truncated set of primes at most $x$.

As the topic of sum sets and difference sets is so vast, in this survey article we
restrict ourselves to an interesting class of questions where there has been significant
progress in recent years. Given a finite set of integers $A$, we may look at $A+A$ and
$A-A$. The most natural question to ask is: As we vary $A$ over a family of sets, how
often is the cardinality of $A+A$ larger than that of $A-A$? Denoting the size of a set $S$ by
$|S|$, for $|A|$ large we expect a typical $A$ to have $|A+A| < |A-A|$. This is because,
while the diagonal pairs $(a, a)$ contribute a new sum to $A+A$ for each $a$ but only
one difference (namely 0) to $A-A$, addition is commutative while subtraction is
not. This means that for the larger collection of pairs of distinct elements $(a, a')$
we have $a + a' = a' + a$ but $a - a' \ne a' - a$. We see a typical pair contributes two
differences to $A-A$ but only one sum to $A+A$. Using such logic, one expects sets
with $|A+A| > |A-A|$ to be rare.
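
The pair-counting heuristic above can be checked on a small example. The following Python sketch is not part of the original text: the helper names `sumset` and `diffset` and the choice of the set {1, 2, 4, 8, 16} are illustrative assumptions. For this particular set no two pairs share a sum or a nonzero difference, so a set of size n gives exactly n(n+1)/2 sums and n(n-1)+1 differences.

```python
def sumset(A):
    """Return A + A = {a + b : a, b in A}."""
    return {a + b for a in A for b in A}


def diffset(A):
    """Return A - A = {a - b : a, b in A}."""
    return {a - b for a in A for b in A}


# Illustrative choice (an assumption, not from the text): powers of 2,
# for which no two pairs give the same sum or the same nonzero difference.
A = {1, 2, 4, 8, 16}
n = len(A)

num_sums = len(sumset(A))    # n diagonal pairs plus n(n-1)/2 unordered pairs
num_diffs = len(diffset(A))  # 0 once, plus two differences per unordered pair

print(num_sums, n * (n + 1) // 2)
print(num_diffs, n * (n - 1) + 1)
```

Sets with more additive structure have many coincidences among both sums and differences, which is why the comparison becomes delicate for typical dense sets.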

If $|A+A| > |A-A|$, we say $A$ is a sum-dominated set or a More Sums Than
Differences (MSTD) set, while if $|A+A| = |A-A|$ we say $A$ is balanced, and if
$|A+A| < |A-A|$ then $A$ is difference-dominated. The purpose of this article is to
describe results in the following areas.

1. Non-probabilistic constructions of MSTD sets. In this section we summarize
some of the early constructions of MSTD sets, paying special attention to the
limitation of these techniques in determining whether or not a typical set is
sum-dominated.
2. A positive percentage of sets are MSTD sets. Here we discuss the papers of
Martin and O'Bryant [MO06] and Zhao [Zh2], which show that a very small, but
positive, percentage of all sets are sum-dominated.

3. When a 'typical' subset is difference-dominated. If we choose our subsets of
$\{0, \dots, n-1\}$ from the uniform model, so that each of the $2^n$ possible subsets
is equally likely to be chosen, then the previous section shows a positive
percentage of subsets are sum-dominated. The situation is drastically different if
we sample differently. We describe the results of Hegarty and Miller [HM09],
who showed that if each element from $\{0, \dots, n-1\}$ is chosen with probability
$p(n)$ and $\lim_{n\to\infty} p(n) = 0$, then in the limit almost all subsets are
difference-dominated.

4. Explicit constructions of large families of MSTD sets. The methods of [MO06,
Zh2] are probabilistic, and do not yield explicit families of MSTD sets. Miller,
Orosz and Scheinerman [MOS09] gave an explicit construction of a large family
of subsets of $\{0, \dots, n-1\}$ that are MSTD sets, specifically one whose density
is at least $C/n^4$ for some $C > 0$; later Zhao [Zh1] gave a different construction
yielding $C'/n$ with $C' > 0$. We describe these constructions and generalizations;
for example, Miller, Pegado and Robinson [MPR12] show that the density of sets
$A \subset \{0, \dots, n-1\}$ with $|A+A+A+A| > |A+A-A-A|$ is at least $C''/n^r$ where
$r = \frac{1}{6}\log_2(256/255) \le .001$.

5. Generalized MSTD Sets. A set $A$ is a $k$-generational sum-dominant set if $A, A+A,
\dots, kA = A + \cdots + A$ are each sum-dominant. Iyer, Lazarev, Miller and Zhang
[ILMZ11] proved that a positive percentage of sets are $k$-generational for any
positive $k$, but no set is $k$-generational for all $k$. Their construction uses a result
of interest in its own right, namely that if we are given any legitimate order of
linear combinations of sums and differences of $A$ of the same length¹, a positive
percentage of $A$ have the cardinalities of these combinations in the desired ordering.
Such a result was expected from the work of Miller, Orosz and Scheinerman
[MOS09], who showed if there exists one set satisfying the ordering then there
exists a large, explicitly constructible family of sets satisfying the condition. In
[ILMZ11] the needed set for the induction is found, and instead of appealing to
results from [MOS09], the authors modify the arguments of [MO06] in order to
obtain a positive percentage.

The above list of topics is not meant to be definitive or exhaustive, but rather to 
highlight some of the many results in the field. There are numerous generalizations 
to other linear combinations of sets, as well as related problems in Abelian groups, 
that can be handled with these methods. We strongly urge the reader to consult the 
references for full details and statements of related, open questions. 



Miller thanks Mel Nathanson who, through books and conversations, helped introduce
him to this exciting subject, his collaborators Peter Hegarty, Brooke Orosz,



¹ Note that $A+A+A-A = -(A-A-A-A)$; thus we might as well assume any linear combination
has at least as many sums of $A$ as differences of $A$.
2 Non-probabilistic Constructions of MSTD sets

In [Na06], Nathanson wrote "Even though there exist sets A that have more sums
than differences, such sets should be rare, and it must be true with the right way of
counting that the vast majority of sets satisfies $|A-A| > |A+A|$." Support for this
view can be found in the length of the search required to find the first MSTD set.
Conway is said to have found {0, 2, 3, 4, 7, 11, 12, 14} in the 1960s, while Marica
[Ma69] in 1969 gave {0, 1, 2, 4, 7, 8, 12, 14, 15} and Freiman and Pigarev [FP73]
found {0, 1, 2, 4, 5, 9, 12, 13, 14, 16, 17, 21, 24, 25, 26, 28, 29} in 1973. See also the
papers by Ruzsa [Ru76, Ru84, Ru92].

How hard is it to find such sets? A simple calculation shows that if $B = \alpha A +
\beta$ with $\alpha \ne 0$, then $|A+A| = |B+B|$ and $|A-A| = |B-B|$; thus we might as well assume
0 is in our subset. The number of subsets of $\{0, \dots, 14\}$ that include 0 is $2^{14}$ =
16,384. This is easily searchable by computer, though a little out of the range of even
the most patient of mathematicians; the only eight-element MSTD set found, up to the
reflection $x \mapsto 14 - x$, is the one already mentioned. Even Freiman and Pigarev's
example can be found by brute force within a reasonable time, as $2^{29}$ = 536,870,912.
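
The search described above is small enough to run directly. The sketch below is our own illustration rather than the code used by any of the cited authors; the function names `is_mstd` and `mstd_subsets_containing_zero` are hypothetical. It enumerates every subset of {0, ..., 14} that contains 0 and keeps those with more sums than differences.

```python
from itertools import combinations


def is_mstd(A):
    """True if |A + A| > |A - A|."""
    sums = {a + b for a in A for b in A}
    diffs = {a - b for a in A for b in A}
    return len(sums) > len(diffs)


def mstd_subsets_containing_zero(top):
    """List every MSTD subset of {0, ..., top} that contains 0."""
    found = []
    for size in range(top + 1):
        for tail in combinations(range(1, top + 1), size):
            A = (0,) + tail
            if is_mstd(A):
                found.append(A)
    return found


found = mstd_subsets_containing_zero(14)   # examines 2**14 = 16,384 subsets
for A in found:
    print(A)
```

The output contains Conway's set and its mirror image {0, 2, 3, 7, 10, 11, 12, 14}. It also contains nine-element sets such as {0, 1, 2, 4, 5, 9, 12, 13, 14}, which has 28 sums and 27 differences.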

While there are many constructions of MSTD sets, most of these constructions
give a vanishingly small percentage of sets to be sum-dominated. Specifically, while
there are $2^{n+1}$ subsets of $\{0, 1, \dots, n\}$, these methods often give only on the order of
$2^{n/2}$ (or worse) subsets that are MSTD.

For example, one way to generate an infinite family of MSTD sets from one
known MSTD set is through the base expansion method. Let $A$ be an MSTD set, and
let $A_{k;m} = \{\sum_{i=1}^{k} a_i m^{i-1} : a_i \in A\}$. If $m$ is sufficiently large, then
$|A_{k;m} \pm A_{k;m}| = |A \pm A|^k$. We thus obtain an infinite family of MSTD sets, and, so long as
$|A+A| > 1$, we can have arbitrarily many more sums than differences. Unfortunately, as $m$ is
large, the percentage of subsets created that are sum-dominated is exponentially small. We
thus discuss other constructions (though this method will play an important role in
proving many of the theorems in §6).
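
As a small sanity check of the base expansion method, the sketch below builds $A_{k;m}$ from Conway's set and compares the sizes of its sumset and difference set with $|A \pm A|^k$. The function name `base_expansion` and the choice m = 100 are our assumptions; any m large enough that no carrying occurs between digits (here m > 2 max A suffices) would do.

```python
from itertools import product


def sumset(A):
    return {a + b for a in A for b in A}


def diffset(A):
    return {a - b for a in A for b in A}


def base_expansion(A, k, m):
    """A_{k;m} = { a_1 + a_2*m + ... + a_k*m**(k-1) : each a_i in A }."""
    return {sum(a * m**i for i, a in enumerate(digits))
            for digits in product(sorted(A), repeat=k)}


A = {0, 2, 3, 4, 7, 11, 12, 14}   # Conway's set: 26 sums, 25 differences
m = 100                           # assumed large enough to avoid carries

A2 = base_expansion(A, 2, m)
print(len(sumset(A)), len(diffset(A)))     # sizes for A itself
print(len(sumset(A2)), len(diffset(A2)))   # sizes for A_{2;100}
```

With k = 2 the excess of sums over differences grows from 1 to 26^2 - 25^2 = 51, illustrating how the method produces arbitrarily many more sums than differences.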

It is very easy to create balanced sets, and many constructions of MSTD sets
take advantage of this. First, note that if $A$ is an arithmetic progression then $A$ is
balanced. To see this, letting $A = \{0, 1, \dots, n\}$ we find $A+A = \{0, 1, \dots, 2n\}$ and
$A-A = \{-n, \dots, n\}$, so $|A+A| = |A-A| = 2n+1$. Another way to create a balanced
set is to take a set symmetric with respect to a number (which need not be in the
set); this means that there is a number $a^*$ such that $A = a^* - A$ (this implies $A+A =
a^* + A - A$, so $|A+A| = |A-A|$). Note arithmetic progressions are a special case,
with $a^* = n$ for $A = \{0, 1, \dots, n\}$. Nathanson [Na07] gives constructions of MSTD sets
using this idea. He creates infinite families by adjoining one number to a symmetric set
which is a small perturbation of a generalized arithmetic progression. Numerous examples and
explicit constructions are given in [Na07]; we state the first.

Theorem 2.1 (Nathanson [Na07]) Let $m$, $d$, and $k$ be integers with $m \ge 4$, $1 \le d \le
m-1$, $d \ne m/2$, and $k \ge 3$ if $d < m/2$ and $k \ge 4$ if $d > m/2$. Let $B = \{0, 1, \dots, m-1\}
\setminus \{d\}$, $L = \{m-d, 2m-d, \dots, km-d\}$, $a^* = (k+1)m - 2d$, and $A^* = B \cup L \cup
(a^* - B)$. Then $A = A^* \cup \{m\}$ is an MSTD set.
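
The construction in Theorem 2.1 is easy to carry out by machine. The following sketch is ours (the function name `nathanson_set` is hypothetical); it builds $A^*$ and $A$ from the parameters and checks the smallest admissible case m = 4, d = 1, k = 3, which reproduces Conway's set.

```python
def sumset(A):
    return {a + b for a in A for b in A}


def diffset(A):
    return {a - b for a in A for b in A}


def nathanson_set(m, d, k):
    """Return (A, A_star, a_star) as in Theorem 2.1."""
    if not (m >= 4 and 1 <= d <= m - 1 and 2 * d != m):
        raise ValueError("need m >= 4, 1 <= d <= m - 1 and d != m/2")
    if k < (3 if 2 * d < m else 4):
        raise ValueError("k is too small for this choice of d")
    B = set(range(m)) - {d}
    L = {j * m - d for j in range(1, k + 1)}
    a_star = (k + 1) * m - 2 * d
    A_star = B | L | {a_star - b for b in B}   # symmetric about a_star / 2
    return A_star | {m}, A_star, a_star


A, A_star, a_star = nathanson_set(4, 1, 3)
print(sorted(A_star), len(sumset(A_star)), len(diffset(A_star)))
print(sorted(A), len(sumset(A)), len(diffset(A)))
```

Here $A^*$ = {0, 2, 3, 7, 11, 12, 14} is symmetric (hence balanced), and adjoining m = 4 tips the balance toward sums.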

How large of a family is this? We have three parameters at our disposal: $m$, $d$ and
$k$. Note $A \subset \{0, \dots, (k+1)m - 2d\}$. Given some $n$, look at all triples $(m, d, k)$ such
that $(k+1)m - 2d \le n$; this will be an upper bound for the number of MSTD sets
generated by the theorem that live in $\{0, 1, \dots, n\}$ (it will be the actual number if we
show all the sets are distinct). As we also need $k$ to be at least three, we obtain an
upper bound by counting all pairs $(k, m)$ with $km < n$ (which is trivially at most $n^2$)
and noting that we have $m \le n$ choices of $d$ for each pair. Thus this method generates
at most $n^3$ subsets of $\{0, 1, \dots, n\}$ being MSTD sets, which is a vanishingly small
fraction in the limit. The paucity of this family is due to how explicit the construction
is: everything is completely deterministic and at each stage there is only one option.
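
To get a feel for how sparse this family is, one can count the admissible parameter triples directly. The sketch below is an illustration under our own naming (`count_parameter_triples` is hypothetical); it counts triples (m, d, k) satisfying the hypotheses of Theorem 2.1 with (k+1)m - 2d at most n, and prints the count next to the bound n^3 and the total number 2^(n+1) of subsets.

```python
def count_parameter_triples(n):
    """Count (m, d, k) allowed by Theorem 2.1 with (k + 1) * m - 2 * d <= n."""
    count = 0
    for m in range(4, n + 1):
        for d in range(1, m):
            if 2 * d == m:
                continue                      # d = m/2 is excluded
            k = 3 if 2 * d < m else 4         # smallest admissible k
            while (k + 1) * m - 2 * d <= n:
                count += 1
                k += 1
    return count


for n in (14, 50, 100, 200):
    print(n, count_parameter_triples(n), n ** 3, 2 ** (n + 1))
```

For n = 14 there are only two admissible triples, (4, 1, 3) and (4, 3, 4), against 2^15 subsets.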

We conclude our discussion on constructions of MSTD sets and families of
MSTD sets with a result of Hegarty [He07]. He proved

Theorem 2.2 (Hegarty [He07]) There are no MSTD subsets of the integers of size
seven. Up to linear transformations the only set of size 8 is {0, 2, 3, 4, 7, 11, 12, 14}.

We paraphrase (slightly) from [He07] the description of the proof. Let $A = \{a_n =
0, a_{n-1}, \dots, a_1\}$, and represent the $n-1$ differences $a_i - a_{i+1}$ as $e_i$ (the $i$-th standard
basis vector in $\mathbb{R}^{n-1}$). If we leave the $a_i$'s undetermined, then $|A+A| = n(n+1)/2$
and $|A-A| = n(n-1)+1$. As $|A-A|$ is larger (in the case where the $a_i$'s are
undetermined), in order for $A$ to be an MSTD set we must have non-trivial coincidence
of differences, specifically $a_i - a_j = a_k - a_\ell$ for some $(i, j) \ne (k, \ell)$. Given such an
equation we can, by projection onto the orthogonal complement in $\mathbb{R}^{n-1}$ of the
subspace that $(e_i - e_j) - (e_k - e_\ell)$ spans, represent elements of $A$ by vectors in $\mathbb{R}^{n-2}$.
We recompute $|A+A|$ and $|A-A|$. If $|A+A| \le |A-A|$ we pick another non-trivial
identification of elements in $A-A$ and repeat the above method with elements of $A$
now represented as vectors in $\mathbb{R}^{n-3}$. The computation ends with all MSTD sets of
size $n$ whose smallest element is 0. With some additional insights that improve the
run-time, the program can check $n = 8$ fairly quickly; unfortunately $n = 9$ is still
open (though Hegarty has results for all MSTD sets of size 9 having an additional
property).
3 A positive percentage of sets are MSTD sets

As for each $n$ studied very few of the $2^n$ subsets of $\{0, 1, \dots, n-1\}$ were found to
be sum-dominant, it was reasonable to conjecture that in the limit almost no subsets
were sum-dominant. While this conjecture is false, the percentage of sum-dominant
sets is so small that this error is understandable.

Theorem 3.1 (Martin-O'Bryant [MO06]) As $n \to \infty$, a positive percentage of
subsets of $\{0, \dots, n-1\}$ are sum-dominant.

Martin and O'Bryant [MO06] proved this probability is at least $2 \cdot 10^{-7}$, which was
improved by Zhao [Zh2] to at least $4 \cdot 10^{-4}$; Monte Carlo experiments suggest the
true answer is around $4.5 \cdot 10^{-4}$. For small $n$, it is possible to enumerate all subsets
of $\{0, \dots, n-1\}$, which we do in Figure 1.



[Figure 1: plot of P(n) against log n; image not reproduced.]

Fig. 1 The percentage of sum-dominated subsets of $\{0, \dots, n-1\}$ versus $\log n$. These numbers
were obtained by enumerating all possible subsets for n < 27, and by simulating 10,000,000 subsets
for each $n \in \{30, 35, 40, 45, 50, 75, 100, 125, 150\}$.



Martin and O'Bryant's proof uses probabilistic techniques to estimate the chance
that elements are in the sumset and the difference set. For definiteness, consider
subsets $S$ of $\{0, 1, \dots, n-1\}$. The sumset $S+S$ lies in $\{0, 1, \dots, 2n-2\}$ and the
difference set $S-S$ in $\{-(n-1), \dots, n-1\}$. The number of representations of a typical
$k \in \{0, 1, \dots, 2n-2\}$ as a sum of two elements of $S$ is roughly $n/4 - |n-k|/4$, while
the number of representations of a typical $k \in \{-(n-1), \dots, n-1\}$ as a difference
of two elements of $S$ is roughly $n/4 - |k|/4$. To see this, first consider the special
case when $S = \{0, 1, \dots, n\}$. If we want $k = x+y$ with $x \le y$, note once $x$ is chosen
then $y$ is determined. If $k < n-1$ there are essentially $k/2$ choices for $x$; the other
case is handled similarly. Our answer differs from $n/4 - |n-k|/4$ by a factor of 2.
This factor is due to the fact that a typical set $S$ has approximately $n/2$ elements, and
not $n$ elements (by the Central Limit Theorem, the probability is vanishingly small
that $|S|$ differs from $n/2$ by more than $n^{1/2+\epsilon}$). Figure 2 demonstrates the rapidity
of convergence. There we uniformly choose many $A \subset \{0, \dots, 99\}$ and calculate the
average number of representations for all the possible sums and differences, and
compare with the predictions above. Note for the difference plot we have removed
the spike at 0, as for each $A$ there are $|A|$ ways of representing 0 from $A-A$, and by
the Central Limit Theorem $|A|$ is approximately 100/2 or 50.




Fig. 2 Comparison of predicted and observed number of representations of possible elements of
the sumset and difference set for $A \subset \{0, \dots, 99\}$ chosen from the uniform model (so each of the
$2^{100}$ possible subsets is equally likely to be chosen). We chose 100 different such A and calculated
the average number of representations of each possible sum (left plot, which lives in $\{0, \dots, 198\}$)
and difference (right plot, which lives in $\{-99, \dots, 99\}$), compared with the theoretical predictions.
Note the spike at 0 was removed from the difference plot.
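
The comparison in Figure 2 can be imitated with a short simulation. The sketch below is our own and makes one assumption the text leaves open: representations are counted as ordered pairs (x, y), which is the convention under which the heuristics n/4 - |n-k|/4 and n/4 - |k|/4 match the averages for random subsets. All function names are hypothetical.

```python
import random
from collections import Counter


def sum_reps(S):
    """r[k] = number of ordered pairs (x, y) in S x S with x + y = k."""
    return Counter(x + y for x in S for y in S)


def diff_reps(S):
    """r[k] = number of ordered pairs (x, y) in S x S with x - y = k."""
    return Counter(x - y for x in S for y in S)


def predicted_sum_reps(n, k):
    """Heuristic from the text for k in {0, ..., 2n - 2}."""
    return n / 4 - abs(n - k) / 4


def predicted_diff_reps(n, k):
    """Heuristic from the text for nonzero k in {-(n - 1), ..., n - 1}."""
    return n / 4 - abs(k) / 4


def average_reps(n, trials, seed=0):
    """Average representation counts over uniformly random subsets."""
    rng = random.Random(seed)
    sums, diffs = Counter(), Counter()
    for _ in range(trials):
        S = [x for x in range(n) if rng.random() < 0.5]
        sums.update(sum_reps(S))
        diffs.update(diff_reps(S))
    avg_s = {k: sums[k] / trials for k in range(2 * n - 1)}
    avg_d = {k: diffs[k] / trials for k in range(-(n - 1), n)}
    return avg_s, avg_d


n = 100
avg_s, avg_d = average_reps(n, trials=100)
for k in (20, 60, 99, 140, 180):
    print("sum ", k, round(avg_s[k], 2), predicted_sum_reps(n, k))
for k in (-80, -30, 30, 80):
    print("diff", k, round(avg_d[k], 2), predicted_diff_reps(n, k))
```

As in the figure, k = 0 should be excluded from the difference comparison, since it has |A| representations.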



We see from the above that there are many ways to represent the possible sums
or differences, so long as they are not near the fringe elements. Their proof
proceeds as follows. Let $A$ be an MSTD set, and write $A$ as a disjoint union $L \cup U$, with
$L \subset \{0, \dots, \ell-1\}$ and $U \subset \{\ell, \dots, \ell+u-1\}$. Consider the sets $A_M = L \cup M \cup U'$,
where $M \subset \{\ell, \dots, \ell+m-1\}$ and $U' = U + m$ (so $U'$ is just $U$ translated by
$m$). If $k$ is close to 0 (respectively to the largest possible sum $2(\ell+m+u-1)$), then
whether or not $k \in A_M + A_M$ depends only on $L+L$ (respectively $U'+U'$). Similarly, the
fringe elements of $A_M - A_M$ are determined by $U'-L$ and $L-U'$. By cleverly choosing $A$ (they take
$L = \{0, 2, 3, 7, 8, 9, 10\}$ and $U = \{11, 12, 13, 14, 16, 19, 20, 21\}$) we can ensure that
there are more sum fringe elements included than difference fringe elements. The
proof is completed by showing that a positive percentage of the possible $M$'s lead
to no missing sums or differences in the remaining intervals. This is accomplished
through a series of technical lemmas. The estimates here are far from optimal, but
suffice to prove a positive percentage of subsets are sum-dominant. Specifically, the
authors frequently appeal to the crude estimate that

    $$\mathrm{Prob}(\{a, a+1, \dots, b\} \not\subset A+A) \le \sum_{k=a}^{b} \mathrm{Prob}(k \notin A+A)$$

(and similarly for difference sets).
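
The fringe-and-middle idea can be explored numerically. The sketch below is an illustration only, not Martin and O'Bryant's argument: it inserts a uniformly random middle M between the fringes L and U given above and estimates how often the resulting set is MSTD. The names `insert_middle` and `fraction_mstd` are our own.

```python
import random

L = {0, 2, 3, 7, 8, 9, 10}               # lower fringe from the text
U = {11, 12, 13, 14, 16, 19, 20, 21}     # upper fringe from the text
ELL = 11                                 # L lies in {0, ..., ELL - 1}


def sumset(A):
    return {a + b for a in A for b in A}


def diffset(A):
    return {a - b for a in A for b in A}


def insert_middle(M, m):
    """Return L | M | (U + m) for M inside {ELL, ..., ELL + m - 1}."""
    if not set(M) <= set(range(ELL, ELL + m)):
        raise ValueError("M must lie in the middle interval")
    return L | set(M) | {u + m for u in U}


def fraction_mstd(m, trials, seed=0):
    """Estimate how often a uniformly random middle M gives an MSTD set."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        M = {x for x in range(ELL, ELL + m) if rng.random() < 0.5}
        A = insert_middle(M, m)
        hits += len(sumset(A)) > len(diffset(A))
    return hits / trials


print(fraction_mstd(m=40, trials=200))
```

With the full middle inserted, the only missing sum is 1 and the only missing differences are ±(15 + m), so the set has exactly one more sum than differences; random middles succeed only some of the time.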

There are many other results in this paper. The authors prove the existence of
positive lower bounds for the percentage of sum-dominant, balanced, and
difference-dominated sets. Though they cannot show the limits exist, they conjecture that this
is the case. They show that the average cardinality of the difference sets is four more
than the average cardinality of the sumsets, providing additional support that
sum-dominant sets should be rare. They also explore $|A+A| - |A-A|$, and show that for
any $x$ there is an $A$ such that $|A+A| - |A-A| = x$ with $A \subset \{0, \dots, 17|x|\}$ (which
is significantly more economical than the base expansion method would give). The
paper ends with some numerical explorations of missing sums, and conjectures that
the proportion of subsets $A$ of $\{0, \dots, n-1\}$ with $|A+A| = j$ and $|A-A| = k$
converges to a limiting proportion $\rho_{j,k}$ as $n \to \infty$.

Martin and O'Bryant fixed the fringe (their $L$ and $U$) and varied the middle $M$;
Zhao [Zh2] allowed the fringe to vary as well. His methods allow him to obtain
MSTD sets that are not missing any middle sums, which he shows happens a
vanishingly small number of times. This leads to a significant strengthening of the
results of Martin and O'Bryant, and a proof of many of their (and others') conjectures.
Specifically, he shows the following limits exist (and provides a deterministic
algorithm to approximate their values): the percentage of sets that are sum-dominant;
the percentage of sets that are balanced; the percentage of sets that are
difference-dominant; the percentage of sets that are missing exactly $s$ sums and $d$ differences;
the percentage of sets that have exactly $x$ more sums than differences. The paper
ends with an investigation of the probabilities of various elements being in an MSTD
set, proving a conjecture of Miller, Orosz and Scheinerman [MOS09] that as $n$ grows
the probability a 'middle' element is in an MSTD set in $\{0, \dots, n\}$ tends to 1/2.



4 When a 'typical' subset is difference-dominated 

The proofs that a positive percentage of subsets of $\{0, \dots, n-1\}$ are sum-dominant
all use, in one way or another, the following fact: if $A$ is uniformly drawn from the
$2^n$ subsets of $\{0, \dots, n-1\}$, then with high probability $A$ has essentially $n/2$
elements and almost all possible sums and differences are realized. Along these lines,
Martin and O'Bryant [MO06] showed that a typical difference set is missing only
7 of the possible differences, and a typical sumset is missing 11 (see [ILMZ11] for
a proof that the moments of the limiting distribution exist and the tail probabilities
are bounded above and below by exponentially decaying probabilities). These
techniques apply to a slightly more general case. We may reinterpret the uniform model
above as saying each element $k \in \{0, \dots, n-1\}$ is in a subset $A$ with probability 1/2.
We could instead fix a probability $p \in (0, 1)$ and let each $k$ be in $A$ with probability
$p$.

In this constant probability model, our previous results on a positive percentage
again hold. If, however, we allow $p$ to vary with $n$, then the situation is drastically
different. Hegarty and Miller [HM09] consider a binomial model where each $k \in
\{0, \dots, n-1\}$ is independently chosen to be in a subset $A$ with probability $p(n)$. If
$p(n)$ is a constant independent of $n$, we are in the regime handled by Martin and
O'Bryant (though we described their method in the uniform model case, similar
arguments work so long as the probability is independent of $n$). If, however, $p(n)$
tends to zero, then we are no longer in the case where $|A|$, $|A+A|$ and $|A-A|$ are
always large. In this case very few sets are sum-dominant, which is in line with
Nathanson's (and others') intuition that, if properly counted, sum-dominant sets are
rare.

Before stating their main result, we first set some notation. Let $\mathbb{N}$ denote the
positive integers. We say $f(x) = o(g(x))$ if $|f(x)/g(x)| \to 0$ as $x \to \infty$.

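Theorem 4.1 (Hegarty-Miller [HM09]) Let $p : \mathbb{N} \to (0, 1)$ be any function such
that

    $$n^{-1} = o(p(n)) \quad \text{and} \quad p(n) = o(1). \tag{3}$$

For each $n \in \mathbb{N}$ let $A$ be a random subset of $\{0, \dots, n-1\}$ chosen according to a
binomial distribution with parameter $p(n)$ (so each $k \in \{0, \dots, n-1\}$ is in $A$ with
probability $p(n)$). Then, as $n \to \infty$, the probability that $A$ is difference-dominated
tends to one.

More precisely, let $\mathcal{S}, \mathcal{D}$ denote respectively the random variables $|A+A|$ and
$|A-A|$. Then the following three situations arise:

(i) $p(n) = o(n^{-1/2})$: Then

    $$\mathcal{S} \sim \frac{(n \cdot p(n))^2}{2} \quad \text{and} \quad \mathcal{D} \sim 2\mathcal{S} \sim (n \cdot p(n))^2. \tag{4}$$

(ii) $p(n) = c \cdot n^{-1/2}$ for some $c \in (0, \infty)$: Define the function $g : (0, \infty) \to (0, 2)$ by

    $$g(x) := 2\,\frac{e^{-x} - (1 - x)}{x}. \tag{5}$$

Then

    $$\mathcal{S} \sim g\!\left(\frac{c^2}{2}\right) n \quad \text{and} \quad \mathcal{D} \sim g(c^2)\, n. \tag{6}$$

(iii) $n^{-1/2} = o(p(n))$: Let $\mathcal{S}^c := (2n+1) - \mathcal{S}$, $\mathcal{D}^c := (2n+1) - \mathcal{D}$. Then

    $$\mathcal{S}^c \sim 2\,\mathcal{D}^c \sim \frac{4}{p(n)^2}. \tag{7}$$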

The proof proceeds by using various tools to obtain strong concentration results
on the sizes of the sum and difference sets. The tools needed depend on the decay
of $p(n)$. Not surprisingly, the faster $p(n)$ decays the easier it is to obtain the needed
concentration results. The greater the decay, the fewer elements are in a typical $A$,
and thus the greater the effect of the non-commutativity of subtraction in generating
more new elements. Chebyshev's Theorem suffices for case (i), case (ii) still follows
elementarily (via a second moment argument), while the third case requires some
recent results on strong concentration by Kim and Vu [KV00, Vu00, Vu02].

The idea of the proof, at least in case (i), is fairly straightforward. When $n^{-1} =
o(p(n))$ and $p(n) = o(n^{-1/2})$, then the expected size of a randomly chosen $A$ is
$np(n) = o(n^{1/2})$. The heart of the proof is to show that such sets are nearly Sidon
sets, which means that most pairs of elements generate distinct sums and
differences from other pairs (other than the diagonal pairs, those where the two elements
are equal, which give just one difference, namely zero). As the non-diagonal pairs
generate one sum but two differences, we expect that the difference set will be twice
as large as the sumset. A simpler proof of this case is given in the arXiv version of
[HM09], as well as [HM10] (see Appendix 2).
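
A quick experiment makes the "nearly Sidon" picture concrete. The sketch below is ours, not taken from [HM09]; the exponent 0.75, the seed, and the function names are assumptions chosen for illustration. It draws one sparse set from the binomial model with $p(n) = n^{-3/4}$ and prints the ratio $|A-A|/|A+A|$, which should be a little below 2.

```python
import random


def binomial_subset(n, p, rng):
    """Include each k in {0, ..., n - 1} independently with probability p."""
    return [k for k in range(n) if rng.random() < p]


def difference_to_sum_ratio(A):
    """Return |A - A| / |A + A| for a nonempty collection A."""
    sums = {a + b for a in A for b in A}
    diffs = {a - b for a in A for b in A}
    return len(diffs) / len(sums)


rng = random.Random(2012)      # fixed seed so the run is repeatable
n = 10 ** 6
delta = 0.75                   # assumed exponent in (1/2, 1)
A = binomial_subset(n, n ** -delta, rng)
ratio = difference_to_sum_ratio(A)
print(len(A), ratio)
```

For a set of size k with no coincidences at all the ratio is $(k^2 - k + 1)/(k(k+1)/2)$, which approaches 2 only slowly, so the observed value sits visibly below the limit for sets of a few dozen elements.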

We sketch the proof of case (i) as it highlights the ideas without too many
technicalities. The first step is to bound, with high probability, the size of a subset $A$ of
$\{0, \dots, n-1\}$ chosen from the binomial model with parameter $p(n) = o(n^{-1/2})$. For
ease of exposition, assume $p(n) = cn^{-\delta}$ for some $\delta \in (1/2, 1)$. Using indicator
random variables $X_0, \dots, X_{n-1}$ to denote whether or not $k \in A$, by Chebyshev's theorem
the probability that $X = X_0 + \cdots + X_{n-1}$ is in $[\frac{1}{2}cn^{1-\delta}, \frac{3}{2}cn^{1-\delta}]$ is at least
$1 - 4/(cn^{1-\delta})$, since $X$ has mean $cn^{1-\delta}$ and variance at most $cn^{1-\delta}$.
From here, we obtain upper and lower bounds for the number of pairs of elements
$(m, n)$ with $m < n$ both in $A$. All that remains is to show that, with high probability,
almost all of the pairs generate distinct sums and differences from each other.

For definiteness we study the differences. If $(m, n)$ and $(m', n')$ generate the same
difference then $m - n = m' - n'$. Let $Y_{m,n,m',n'}$ be 1 if $m, n, m', n'$ are in $A$ and $m -
n = m' - n'$, and let $Y$ be the sum of the $Y_{m,n,m',n'}$'s. What is $\mathbb{E}[Y]$? Rather than
determining it exactly, it suffices to obtain an upper bound. One can show $\mathbb{E}[Y] \le
2C^4 n^{3-4\delta}$ where $C = \max(1, c)$ by considering separately the cases where all four
indices are distinct and when three are. As a typical $A$ has size on the order of $n^{1-\delta}$,
we expect on the order of $n^{2-2\delta}$ differences; this is significantly larger than $\mathbb{E}[Y]$, so
most of the differences are distinct from each other. All that remains is to control the
variance of $Y$, and then another application of Chebyshev's theorem proves that $Y$ is
concentrated near its mean, and hence there are on the order of $n^{2-2\delta}$ differences.
The variance estimate follows from elementary counting.

A particularly interesting feature of the above theorem is the existence of a
threshold function for the density. If the density $p(n) = o(n^{-1/2})$ then almost surely
the ratio of the size of the difference set to the sumset is 2, while above the
threshold (so $n^{-1/2} = o(p(n))$) the ratio is 1 (though the number of missing sums is
twice that of the number of missing differences). If $p(n) = cn^{-1/2}$ then the ratio
of $|A-A|/|A+A|$ tends to $g(c^2)/g(c^2/2)$, with $g(x) = 2(e^{-x} - (1-x))/x$. Note
this ratio tends to 2 as $c \to 0$ and tends to 1 as $c \to \infty$, which is in line with Cases
(i) and (iii) of the theorem. There is thus a nice phase transition in behavior, though
this is hard to see experimentally as lO^'^n^''^ is smaller than «^''^log^' n until n 
exceeds exp(10'*^). In Figure 3 we numerically explore this transition.

Not surprisingly, for a fixed $n$ the larger $c$ is, the closer the behavior is to the
limiting case. To investigate this further, in Figure 4 we examine 40 choices of $c$ from
.01 to .41 with $n$ = 1,000,000. For $c$ = .01 the typical random $A$ has only 10
elements; this increases to about 400 when $c$ = .41. We see a noticeable improvement
between the observed and conjectured behavior for this larger value of $n$.

To further investigate the transition behavior, we fixed two values of $c$ and
studied the ratio for various $n$. We chose $c$ = .01 (where the ratio should converge to
1.99997) and $c$ = .1 (where the ratio should converge to 1.99667); the results are
displayed in Table 1.
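
The limiting ratios quoted above can be reproduced from the formula $g(x) = 2(e^{-x} - (1-x))/x$. The sketch below is a direct evaluation; the function names are ours, and `math.expm1` is used only to limit cancellation error for small x.

```python
import math


def g(x):
    """g(x) = 2 * (exp(-x) - (1 - x)) / x, written with expm1 for accuracy."""
    return 2.0 * (math.expm1(-x) + x) / x


def limiting_ratio(c):
    """Predicted limit of |A - A| / |A + A| when p(n) = c / sqrt(n)."""
    return g(c * c) / g(c * c / 2.0)


for c in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(c, limiting_ratio(c))
```

The values for c = .01 and c = .1 agree with 1.99997 and 1.99667 to the digits shown, and the ratio decreases toward 1 as c grows, matching the two extreme cases of the theorem.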
Fig. 3 Plot of $|A-A|/|A+A|$ for ten $A$ chosen uniformly from $\{1, \dots, n\}$ ($n$ = 10,000 on the left
and 100,000 on the right) with probability $p(n) = c/\sqrt{n}$ versus $g(c^2)/g(c^2/2)$.



Fig. 4 Plot of $|A-A|/|A+A|$ for ten $A$ chosen uniformly from $\{1, \dots, n\}$ with probability $p(n) =
c/\sqrt{n}$ ($n$ = 1,000,000) versus $g(c^2)/g(c^2/2)$ (second plot is just a zoom in of the first).



          n        Observed Ratio (c = .01)    Observed Ratio (c = .1)
      100,000              1.123                       1.873
    1,000,000              1.614                       1.956
   10,000,000              1.871                       1.984
  100,000,000              1.960                       1.993

Table 1 Observed ratios of $|A-A|/|A+A|$ for $A$ chosen with the binomial model $p(n) = cn^{-1/2}$ for
$k \in \{0, \dots, n-1\}$ for $c$ = .01 and .1; as $n \to \infty$ the ratios should respectively converge to 1.99997
and 1.99667. Each observed data point is the average from 10 randomly chosen A's, except the last
one for $c$ = .1 which was for just one randomly chosen A.



5 Explicit constructions of large families of MSTD sets 



Until recently, all explicit constructions of families of MSTD sets led to very sparse
families, with an exponentially small percentage of the $2^n$ subsets of $\{0, \dots, n-1\}$
being sum-dominant. While the methods of Martin and O'Bryant proved that a
positive percentage of the $2^n$ subsets were sum-dominant, their probabilistic method did
not allow them to explicitly list these MSTD sets. We quickly review their
construction, which was described in greater detail in §3.

The word explicit requires some comment. We say a construction is explicit if
there is a very simple rule that can quickly be implemented to generate the sets.
For example, one method involves taking any set $M \subset \{0, \dots, m-1\}$ such that there
are never $k$ consecutive elements in $\{0, \dots, m-1\}$ not in $M$. It is very easy to write
down sets having this property; it is also easy to count how many such sets there are
(and it is this ease in counting that leads to many good results).

Martin and O'Bryant began by choosing a special set $A = L \cup U$ with $L \subset
\{0, \dots, \ell-1\}$ and $U \subset \{\ell, \dots, \ell+u-1\}$ such that more of the fringe sums were
realized in $A+A$ than fringe differences. They then showed that one could insert
almost any set in the middle of $A$ (shifting the elements of $U$ up) and have a
sum-dominant set. Miller, Orosz and Scheinerman [MOS09] explored which sets, when
inserted, did not lead to sum-dominant sets. While this is a very hard question, it
turns out that if one carefully chooses sets $L$ and $U$ then one can show any set that is
never locally too sparse may be inserted and yield a sum-dominant set. The end
result is a sparser family than that of Martin and O'Bryant; however, it is still a large family,
and all the technical probability lemmas of [MO06] are replaced with elementary
counting arguments.

The following property is crucial in the argument. We say a set of integers $A$ has
the property $P_n$ (or is a $P_n$-set) if both its sumset and its difference set contain all but
the first and last $n$ possible elements (and of course it may or may not contain some
of these fringe elements). Explicitly, let $a = \min A$ and $b = \max A$. Then $A$ is a $P_n$-set
if

    $$\{2a + n, \dots, 2b - n\} \subset A+A \tag{8}$$

and

    $$\{-(b-a) + n, \dots, (b-a) - n\} \subset A-A. \tag{9}$$

It is not hard to show that for fixed $\alpha \in (0, 1/2)$ a random set drawn from $\{0, \dots, n-1\}$
in the uniform model is a $P_{\lfloor \alpha n \rfloor}$-set with probability approaching 1 as $n \to \infty$; it
is even easier in our situation as the length of the set $A$ will grow but $n$ will remain
fixed. Their main result is

Theorem 5.1 (Miller-Orosz-Scheinerman [MOS09]) Let $A = L \cup R$ be a $P_n$, MSTD
set where $L \subset \{0, \dots, n-1\}$, $R \subset \{n, \dots, 2n-1\}$, and $0, 2n-1 \in A$;$^2$ for example,
$A = \{0, 1, 2, 4, 7, 8, 12, 14, 15\}$ from [Ma69] works. Fix a $k \ge n$ and let $m$ be arbitrary.
Let $M$ be any subset of $\{n+k, \dots, n+k+m-1\}$ with the property that it does
not have a run of more than $k$ missing elements (i.e., for all $\ell \in \{n+k, \dots, n+m\}$
there is a $j \in \{\ell-1, \dots, \ell+k-2\}$ such that $j \in M$). Assume further that $n+k \notin M$
and set $A(M;k) = L \cup O_1 \cup M \cup O_2 \cup R'$, where $O_1 = \{n, \dots, n+k-1\}$,
$O_2 = \{n+k+m, \dots, n+2k+m-1\}$ (thus the $O_i$'s are just sets of $k$ consecutive
integers), and $R' = R + 2k + m$. Then

1. $A(M;k)$ is an MSTD set, and thus we obtain an infinite family of distinct MSTD
   sets as $M$ varies;
2. there is a constant $C > 0$ such that as $r \to \infty$ the proportion of subsets of
   $\{0, \dots, r-1\}$ that are in this family (and thus are MSTD sets) is at least $C/r^4$.

$^2$ Requiring $0, 2n-1 \in A$ is quite mild; we do this so that we know the first and last elements of $A$.
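
The construction in Theorem 5.1 can be carried out mechanically. The sketch below assembles $A(M;k)$ from the fringes of the [Ma69] example (so $n = 8$) and tests sum-dominance for every admissible $M$. The helper names and the particular values $k = 8$, $m = 4$ are our own illustrative choices, not taken from [MOS09].

```python
from itertools import combinations


def sumset(A, B):
    return {a + b for a in A for b in B}


def diffset(A, B):
    return {a - b for a in A for b in B}


def is_mstd(A):
    """True if |A + A| > |A - A|."""
    return len(sumset(A, A)) > len(diffset(A, A))


def longest_missing_run(M, lo, hi):
    """Length of the longest run of consecutive integers in [lo, hi] absent from M."""
    longest = current = 0
    for x in range(lo, hi + 1):
        current = 0 if x in M else current + 1
        longest = max(longest, current)
    return longest


def build_A(L, R, n, k, m, M):
    """Assemble A(M;k) = L u O1 u M u O2 u R' as in the theorem statement."""
    O1 = set(range(n, n + k))                    # k consecutive integers
    O2 = set(range(n + k + m, n + 2 * k + m))    # k consecutive integers
    R_shifted = {x + 2 * k + m for x in R}       # R' = R + 2k + m
    return set(L) | O1 | set(M) | O2 | R_shifted


# Fringes of {0, 1, 2, 4, 7, 8, 12, 14, 15}, so n = 8.
L, R, n = {0, 1, 2, 4, 7}, {8, 12, 14, 15}, 8
k, m = 8, 4                            # illustrative choices with k >= n
slots = range(n + k + 1, n + k + m)    # n + k itself is excluded from M

results = []
for size in range(len(slots) + 1):
    for M in combinations(slots, size):
        # Hypothesis of the theorem: no run of more than k missing elements.
        assert longest_missing_run(set(M), n + k, n + k + m - 1) <= k
        results.append(is_mstd(build_A(L, R, n, k, m, M)))

print(all(results), len(results))
```

In this small case all eight choices of $M$ give a sum-dominant set: the middle of the sumset and of the difference set is completely filled, and only the fringe inherited from $L$ and $R$ decides the comparison.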
It turns out that being a $P_n$-set is not an especially harsh condition, and it is
possible to find these sets. The idea of the construction is to add sets in the middle
such that all possible middle sums and differences are obtained, and thus whether
or not $A(M;k)$ is sum-dominant will depend only on $A$. Specifically, it will depend
on whether or not $A$ itself is an MSTD set. While the choices in the construction
are not optimal, they do suffice to almost show that a positive percentage of sets are
sum-dominant, where now we miss by a power instead of by an exponential. A little
algebra shows that if $A$ is a $P_n$-set, then so too is our $A(M;k)$. To see this, we need
only show that we hit all possible sums and differences except at the fringe. Briefly,
the idea behind the construction is that because $O_1$ and $O_2$ have $k$ consecutive
integers and $M$ never misses $k$ consecutive integers, when we look at sums such as
$O_1 + M$ we will always have two elements in $A(M;k)$ that will add to the desired
number (and similarly for the differences).

The rest of the proof deals with examining how restrictive the assumption is
that $M$ never misses $k$ consecutive integers. One can solve this by writing down
a recurrence relation, but an elementary approach is available which yields quite
good results with little work. We assume a slightly stronger condition: we break $M$
into blocks of length $k/2$ and assume $M$ always has an element from each of these
blocks. This ensures that there can never be a gap as large as $k$ between elements of
$M$ (the gap is at most $k - 2$). There are $2^{k/2}$ possibilities for each block of length $k/2$;
all but one (choosing no elements) satisfy the stronger condition. The percentage
of such valid sets in $\{0, \dots, r-1\}$ is a constant times

$$\sum_{k=n}^{r/4} \frac{1}{2^{2k}} \left(1 - \frac{1}{2^{k/2}}\right)^{\frac{r}{k/2}}. \qquad (10)$$



There are two factors leading to obtaining less than a positive percentage. The first
is, obviously, that in each block of length $k/2$ we lose one possibility, and this factor
is raised to a high power. The second is that $O_1$ and $O_2$ are completely determined
and their length depends on $k$. Thus, as soon as $k$ grows with $n$, we see we cannot
have a positive percentage. Analyzing the sum gives the claimed bounds.
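
To get a feel for how much the block condition gives away, one can compare it with an exact count obtained from the recurrence mentioned above. The following sketch is our own illustration. It assumes that "no run of more than $k$ missing elements" means every run of absent elements has length at most $k$, and it takes $k$ even with the length $m$ a multiple of $k/2$.

```python
from itertools import product


def count_no_long_gap(m, k):
    """Count 0/1 strings of length m whose runs of 0s all have length <= k."""
    # ways[z] = number of strings built so far that end in exactly z zeros.
    ways = [1] + [0] * k
    for _ in range(m):
        new = [0] * (k + 1)
        new[0] = sum(ways)          # append a 1: the trailing run resets
        for z in range(k):
            new[z + 1] = ways[z]    # append a 0: the trailing run grows
        ways = new
    return sum(ways)


def count_block_condition(m, k):
    """Count strings with a 1 in every block of length k/2 (m a multiple of k/2)."""
    half = k // 2
    assert k % 2 == 0 and m % half == 0
    return (2 ** half - 1) ** (m // half)


def brute_force(m, k):
    """Direct enumeration, used only to sanity-check the recurrence."""
    return sum("0" * (k + 1) not in "".join(bits) for bits in product("01", repeat=m))


m, k = 24, 8
exact = count_no_long_gap(m, k)
blocks = count_block_condition(m, k)
print(blocks / 2 ** m, exact / 2 ** m)
```

The block count is always a lower bound for the exact count, since a set meeting every block of length $k/2$ can never miss more than $k - 2$ consecutive elements.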

Remark 5.2 The above theorem can be improved by appealing to an analysis of the
probability that $m$ consecutive tosses of a fair coin have a longest streak of consecutive
heads of length $\ell$ (see [Sc90]). What is fascinating about the answer is that while
the expected value of $\ell$ grows like $\log_2(m/2)$, the variance converges to a quantity
independent of $m$, implying an incredibly tight concentration. If we take $O_1$ and $O_2$
as before and of length $k$, we may take a positive percentage of all $M$'s of length
$m$ to insert in the middle, so long as $k = \log_2(m/2) - c$ for some $c$. The size of $A$
is negligible; the set has length essentially $m + 2k$. Of the $2^{m+2k}$ possible middles
to insert, there are $C 2^m$ possibilities (a positive percentage of the $M$'s work, but
the two $O$'s are completely forced upon us). This gives a percentage on the order of
$2^m / 2^{m+2k}$; as $k \approx \log_2(m/2) - c$, this gives on the order of $1/m^2$ as a lower bound
for the percentage of sum-dominant sets, much better than the previous $1/m^4$.
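
The arithmetic behind this last estimate is short. Writing the proportion of admissible middles as a constant $C$ (as in the remark) and substituting $k = \log_2(m/2) - c$, so that $2^{2k} = (m/2)^2 / 2^{2c}$, one finds

```latex
\[
  \frac{C\,2^{m}}{2^{m+2k}}
  \;=\; \frac{C}{2^{2k}}
  \;=\; \frac{C\,2^{2c}}{(m/2)^{2}}
  \;=\; \frac{4C\,2^{2c}}{m^{2}},
\]
```

that is, a constant multiple of $1/m^2$.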
The results of [MOS09] can be generalized to compare linear forms. We can find
infinite families of sets satisfying

$$|\epsilon_1 A + \cdots + \epsilon_n A| > |\tilde{\epsilon}_1 A + \cdots + \tilde{\epsilon}_n A|, \quad \epsilon_i, \tilde{\epsilon}_i \in \{-1, 1\} \qquad (11)$$

if we can find one set satisfying the above. We've seen from [MO06, Zh2] that very
few sets are sum-dominant; thus we expect the percentage of sets satisfying (11) to
be extremely small, and thus expect it to be a challenge to find the needed set. Brute
force search found $\{0, 1, 2, 3, 7, 11, 17, 21, 22, 24, 25, 28, 29, 30, 31, 33, 44, 45, 48, 49\}$,
which gives $|A+A+A| > |A+A-A|$; unfortunately, such naive searching was
unsuccessful in finding examples for other comparisons. We describe a new method
by Iyer, Lazarev, Miller and Zhang [ILMZ11] in §6 which generates the needed sets
to begin the induction arguments.
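
Comparisons of the form (11) can be re-checked by direct enumeration. Below is a minimal sketch; the helper names `signed_sumset` and `compare_forms` are ours and are not taken from [ILMZ11]. It builds $\epsilon_1 A + \cdots + \epsilon_n A$ one summand at a time and reports the two cardinalities for the 20-element set quoted above.

```python
def signed_sumset(A, signs):
    """Return e_1 A + ... + e_n A for a tuple of signs e_i in {-1, +1}."""
    result = {0}
    for e in signs:
        result = {x + e * a for x in result for a in A}
    return result


def compare_forms(A, signs1, signs2):
    """Return the pair (|form 1|, |form 2|) for the two sign patterns."""
    return len(signed_sumset(A, signs1)), len(signed_sumset(A, signs2))


# The 20-element set quoted in the text: A + A + A versus A + A - A.
A = {0, 1, 2, 3, 7, 11, 17, 21, 22, 24, 25, 28, 29, 30, 31, 33, 44, 45, 48, 49}
three_sums, two_sums_one_diff = compare_forms(A, (1, 1, 1), (1, 1, -1))
print(three_sums, two_sums_one_diff)
```

The same two functions handle any pair of sign patterns, for example `compare_forms(A, (1, 1), (1, -1))` for the classical comparison of $|A+A|$ with $|A-A|$.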

In the above generalizations, the construction from [MOS09] with $|A+A| > |A-A|$
is mimicked for the linear forms. In particular, we still assume that $M$ has at least
one element in each block of length $k/2$. While this was necessary for $|A+A| > |A-A|$,
Miller, Pegado and Robinson [MPR12] show that this is not needed in general.
For example, if we are studying $|A+A+A+A|$ versus $|A+A-A-A|$, we are
assisted by the fact that we can have $O_i + O_j$ and then add this to $M + M$. The final
result of all of this is that we may allow $O_1$ and $O_2$ to be significantly more sparse
than in [MOS09], where they had to choose $k$ consecutive elements and thus had no
freedom. What matters is that $O_i + O_j$ contain large consecutive blocks of integers,
not that each $O_i$ do so. This allows us to improve upon the $1/2^{2k}$ terms in (10).

Before stating the result, we need to slightly generalize the notion of a $P_n$-set to
a $P_n^4$-set. We say $A$ is a $P_n^4$-set if $A+A+A+A$ and $A+A-A-A$ each contain all
but the first and last $n$ elements; thus what we called a $P_n$-set before is really a $P_n^2$-set.



Theorem 5.3 (Miller-Pegado-Robinson [MPR12]) Let $A = L \cup R$ be a $P_n$, MSTD
set where $L \subset \{0, \dots, n-1\}$, $R \subset \{n, \dots, 2n-1\}$, and $0, 2n-1 \in A$;$^3$ for example,
$A = \{0, 1, 3, 4, 7, 26, 29, 30, 32, 33, 34, 27, 28, 31, 53, 56, 57, 59, 60, 61\}$ works.
Fix a $k \ge n$ and let $m$ be arbitrary. Let $M$ be any subset of $\{n+k, \dots, n+k+m-1\}$
with the property that it does not have a run of more than $k$ missing elements (i.e.,
for all $\ell \in \{n+k, \dots, n+m\}$ there is a $j \in \{\ell-1, \dots, \ell+k-2\}$ such that $j \in M$).
Assume further that $n+k \notin M$ and set $A(M;k) = L \cup O_1 \cup M \cup O_2 \cup R'$, where
$O_1 = \{n, \dots, n+k-1\}$, $O_2 = \{n+k+m, \dots, n+2k+m-1\}$ (thus the $O_i$'s are just sets
of $k$ consecutive integers), and $R' = R + 2k + m$. Then

1. $A(M;k)$ is an MSTD set, and thus we obtain an infinite family of distinct MSTD
   sets as $M$ varies.

2. There is a constant $C > 0$ such that as $r \to \infty$ the proportion of subsets of
   $\{0, \dots, r-1\}$ that are in this family (and thus are MSTD sets) is at least $C/r^4$.

3. With better choices of $O_1$ and $O_2$, one can explicitly construct a large family
   of sets $A$ with $|A+A+A+A| > |(A+A) - (A+A)|$ and show that the density
   of sets $A \subset \{0, \dots, n-1\}$ satisfying this condition is at least $C/n^r$, where
   $r = \frac{1}{6}\log_2(256/255) < .001$.

4. For each integer $k$, there is a set $A \subset \{0, \dots, 157k\}$ such that
   $|2A+2A| - |2A-2A| = k$; if $k$ is large we may take $A \subset \{0, \dots, 35|k|\}$.

$^3$ As before, requiring $0, 2n-1 \in A$ is quite mild and is done so that we know the first and last elements of $A$.

The proof of the first two assertions follows identically as in [MOS09] (if we
argue as in Remark 5.2 and use the results from [Sc90], we may improve the bound in (2)
from $r^{-4}$ to $r^{-2}$). For the third assertion, the additional binary operations give us
enormous savings and remove many of the restrictions on the form of the $O_i$'s. We note
that the $O_i$'s show up in sums and differences at least in pairs, unless matched with
$L+L+L$, $R'+R'+R'$ or $L+L-R'$ ($A = L \cup R$). Each of $L+L+L$, $R'+R'+R'$
and $L+L-R'$ contains a run of 16 elements in a row for our set $A$. This allows us
to relax the restrictions on $O_i$ from [MOS09] (each $O_i$ was $k$ consecutive elements);
if each $O_i$ has no run of 16 missing elements and $2O_i$ is full for both $O_i$'s, simple
algebra shows that we get all sums and differences as before. This looser structure
on the $O_i$'s allows us to replace the $1/2^{2k}$ in (10) with a much better term, leading
to a significantly better exponent and thus a greatly improved density bound.

Returning to MSTD sets (and not their generalizations), the current record for the
densest explicit family of MSTD sets is due to Zhao [Zh1], who found a family of
subsets of $\{0, \dots, n-1\}$ of order $2^n/n$. He achieved this by showing a correspondence
between bidirectional ballot sequences and sum-dominant sets. A ballot sequence is a list of
1s and 0s in which every prefix has more 1s than 0s and the maximum excess of 1s over
0s is attained at the end of the sequence. If you imagine the 1s as winning one dollar and
the 0s as losing one dollar, we may interpret this as follows: we bet a fixed amount each
game, our winnings are always positive and our greatest balance is at the end. A sequence of
1s and 0s is a bidirectional ballot sequence if both it and the reversed sequence are
ballot sequences.

Much of the construction is similar to [MO06, MOS09]: we again take a set that
leads to the desired fringe behavior, and study which sets $M$ may be inserted. Unlike
the previous constructions, here we ask that $M$ is a bidirectional ballot sequence
(where we write 1 if an element is in $M$ and 0 if it is not). This is equivalent to the
following. Let $M \subset \{0, \dots, m-1\}$. Then every prefix and suffix of $\{0, \dots, m-1\}$
has more than half its elements in $M$. As each prefix and suffix has more than half its
elements in $M$, by the pigeonhole principle at least one pair will be in $M$, and that
will generate the desired sum or difference. The problem is thus reduced to counting
the number of bidirectional ballot sequences.
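
For small lengths the count can be obtained by brute force. The sketch below is our own illustration and not Zhao's counting argument: it tests the prefix and suffix condition directly, reading "more than half" as strictly more 1s than 0s in every non-empty prefix and suffix, and enumerates all 0/1 strings of a given length.

```python
from itertools import product


def is_bidirectional_ballot(bits):
    """True if every non-empty prefix and suffix has strictly more 1s than 0s."""
    for seq in (bits, bits[::-1]):       # the reversed pass checks the suffixes
        excess = 0                       # (#1s) - (#0s) in the prefix read so far
        for b in seq:
            excess += 1 if b else -1
            if excess <= 0:
                return False
    return True


def count_bidirectional_ballot(m):
    """Brute-force count of bidirectional ballot sequences of length m."""
    return sum(is_bidirectional_ballot(bits) for bits in product((0, 1), repeat=m))


print([count_bidirectional_ballot(m) for m in range(1, 11)])
```

The enumeration is exponential in the length, so it is only useful as a sanity check on small cases; the first six counts are 1, 1, 1, 1, 2, 3.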



6 Generalized MSTD Sets 

There are many ways to generalize the notion of a sum-dominant set. Below we
discuss two possibilities that were recently analyzed in [ILMZ11]; we comment
briefly on the ideas and constructions, and refer the reader to the article for full
details. As we are always adding sets and never multiplying, in all arguments below
we use the shorthand notation

$$kA = \underbrace{A + \cdots + A}_{k \text{ times}}. \qquad (12)$$

1. Given non-negative integers $s_1, d_1, s_2, d_2$ with $s_1 + d_1 = s_2 + d_2 \ge 2$, can we find
   a set $A$ with $|s_1 A - d_1 A| > |s_2 A - d_2 A|$, and if so, does this occur a positive
   percentage of the time?

2. We say a set is $k$-generational if $A, A+A, \dots, kA$ are all sum-dominant. Do
   $k$-generational sets exist, and if so, do they occur a positive percentage of the time?
   Is there a set that is $k$-generational for all $k$?

The first question is motivated by generalizing the binary comparison. When
$s_1 + d_1 = 2$, the only possible sets are $A+A$ and $A-A$ (note $-A-A$ is the same as
the negation of $A+A$). When $s_1 + d_1 = 3$, there are again essentially just two
possibilities, $A+A+A$ and $A+A-A$ (as $A-A-A = -(A+A-A)$, and thus without
loss of generality we might as well assume $s_i \ge d_i$). The situation is markedly different
once the sum is at least 4. In that case, we now have $A+A+A+A$, $A+A+A-A$
and $A+A-A-A$. All possible orderings happen a positive percentage of the time.

Theorem 6.1 (Iyer-Lazarev-Miller-Zhang [ILMZ11]) Given non-negative integers
$s_1, d_1, s_2, d_2$ with $s_1 + d_1 = s_2 + d_2 = k \ge 2$, if $\{s_1, d_1\} \ne \{s_2, d_2\}$ then a positive
percentage of all sets $A$ satisfy $|s_1 A - d_1 A| > |s_2 A - d_2 A|$. For definiteness
assume $s_1$ is the largest of the $s$'s and $d$'s. Given any non-negative integers $i, j$
with $j < 2i$, for all $n$ sufficiently large there exists an $A \subset \{0, 1, \dots, n\}$ such that
$|s_1 A - d_1 A| = kn + 1 - i$ and $|s_2 A - d_2 A| = kn + 1 - j$.

Sketch of the proof. The proof is similar in spirit to many of the results in the
field; we first find one example by cleverly constructing a set with a certain fringe
structure, and then use the methods from Martin-O'Bryant [MO06] to expand the
set by essentially adding anything in the middle. The difficulty, as was apparent in
[MOS09], is in constructing one such set. To make such a set $A$, we pick fringes
$L$ and $R$ such that their sums (with themselves or with each other) have the same
structure (a few chosen elements below the maximum missing). Then we let
$A = L \cup M \cup (n - R)$, where $M$ is a large interval in the middle. If $M$ is large enough, we
don't have to worry about anything besides the fringes. As $A$ is summed, the fringes
slowly fill in; however, we choose $L$ such that $\max(L) < \max(R)$. This means that
the right fringe of $kA$ fills in faster than the left. Note that the right fringe of $kA$ is
just $k(n - R)$, and the right fringe of $s_2 A - d_2 A$ is $s_2(n - R) - d_2 L$. Since $R$ grows
faster than $L$, we can choose the middle such that $k(n - R)$ will intersect with the
middle and be filled in, but $s_2(n - R) - d_2 L$ will not. At the same time, we have that
the left fringe of $kA$ is missing one element, and the left fringe of $s_2 A - d_2 A$ is as
well. We refer the reader to [ILMZ11] for details of the construction for a given $i$
and $j$.

To illustrate the method, consider

$$L = \{0, 1, 3, 4, \dots, k-1, k, k+1, 2k+1\} = [0, \ell] \setminus (\{2\} \cup [\ell-k+1, \ell-1])$$

$$R = \{0, 1, 2, 4, 5, \dots, k, k+1, k+2, 2k+2\} = [0, r] \setminus (\{3\} \cup [k+3, 2k+1]) \qquad (13)$$

(so $\ell = 2k+1$ and $r = 2k+2$). For any $x, y \in \mathbb{N}$, the basic structure of $xL + yR$ is
the same as that of the original set. Basically, $xL + yR$ is always missing the first $k-1$
elements below the maximum, as well as the singleton element $2k-1$ away from the
maximum. Even more, it is missing no other elements.
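
The stability of this fringe structure is easy to observe numerically. The sketch below is our own illustration, with function names and the choice $k = 5$ that are not from [ILMZ11]. It builds $L$ and $R$ from (13), forms $xL + yR$ for a few pairs $(x, y)$ with $x + y \ge 1$, and lists how far below the maximum each missing element sits.

```python
def sumset(A, B):
    return {a + b for a in A for b in B}


def fringe_sets(k):
    """The sets L and R of (13), written out from their interval descriptions."""
    L = set(range(0, 2 * k + 2)) - {2} - set(range(k + 2, 2 * k + 1))
    R = set(range(0, 2 * k + 3)) - {3} - set(range(k + 3, 2 * k + 2))
    return L, R


def combine(L, R, x, y):
    """Return xL + yR (x copies of L plus y copies of R), for x + y >= 1."""
    total = {0}
    for summand in [L] * x + [R] * y:
        total = sumset(total, summand)
    return total


def missing_from_top(S):
    """Distances below max(S) of the integers in [0, max(S)] absent from S."""
    top = max(S)
    return sorted(top - v for v in range(top + 1) if v not in S)


k = 5
L, R = fringe_sets(k)
for x, y in [(1, 0), (0, 1), (2, 0), (1, 1), (2, 3)]:
    print((x, y), missing_from_top(combine(L, R, x, y)))
```

For $k = 5$ every pair reports the same list of distances, 1, 2, 3, 4 and 9: the run directly below the maximum together with the single element $2k - 1$ away from it.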

Returning to the original problem, our initial set has a fringe structure and suf- 
ficient empty space to allow the fringe to grow and exhibit the desired behavior, 
followed by a full middle. We can have more control of the set's behavior by putting 
in another fringe along the outside, with sufficient empty space to let the fringe 
exhibit the correct behavior before it intersects with the inner fringe. This process 
becomes technical, but it allows for a great degree of control over sets. 

More generally, one has 

Theorem 6.2 (Iyer-Lazarev-Miller-Zhang [ILMZ11]) Given finite sequences of
length $k$ called $x_j, y_j, w_j, z_j$ such that $x_j + y_j = w_j + z_j = j$, $x_j \ne w_j$ and $x_j \ne z_j$ for
every $2 \le j \le k$, there exists a set $A$ such that $|x_j A - y_j A| > |w_j A - z_j A|$ for every
$2 \le j \le k$. In particular, there exists a set $A$ such that $|cA + cA| > |cA - cA|$ for every
$1 \le c \le k$.

The above theorem answers our second question, and is the best possible (at
least in regard to $k$-generational sets) as every set is finite generational. In other
words, one cannot have a set $A$ such that $|cA + cA| > |cA - cA|$ for all $c$. It turns
out that all sets have a kind of limiting behavior. As we continue adding $A$ to its
sums, eventually we have a full middle, and any interesting behavior will occur
on the fringes. Note that if we normalize $A$ to include 0, we have $cA \subset cA - cA$.
Essentially, the difference sets eventually have every fringe element that the sumsets have.
When $c$ is sufficiently large, the fringes of $cA$ stabilize, which gives
$|cA - cA| \ge |cA + cA|$. Now, taking differences allows the left fringe to interact with the right
fringe, while taking only sums keeps these separate. This means that it is possible
(and in fact likely) to have $|cA - cA| > |cA + cA|$ for all sufficiently large $c$. We can
readily obtain an upper bound on how long we must wait for the limiting behavior
of $|kA|$ to set in.

Theorem 6.3 (Iyer-Lazarev-Miller-Zhang [ILMZ11]) Let $A = \{a_1, a_2, \dots, a_m\} \subset
\{0, 1, \dots, n-1\}$ be a set of integers ($a_1 < a_2 < \cdots < a_m$) and let $s = \gcd(a_1, a_2, \dots, a_m)$.
Then there exists an integer $N$ such that for $k > N$ we have $|kA| = [\ldots] - C$,
where $C$ is a constant and $k$ is bounded above by $[\ldots]$.

Sketch of the proof: It is enough to show the claim for a set of the form
$\{0, a_1, \dots, a_m\}$ with $\gcd(a_1, \dots, a_m) = 1$. Adding $A$ to itself $a_1$ times will generate
all congruence classes modulo $a_1$ because $\gcd(a_1, \dots, a_m) = 1$. Adding $A$ to itself $a_m$
times will make both the left ($L$) and right ($R$) fringes stabilize, where
$L = kA \cap \{0, 1, \dots, a_1 a_m\}$ and $R = kA \cap \{k a_m - a_1 a_m, \dots, k a_m\}$, and also ensures that the middle
part is completely filled.
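
The limiting behavior is easy to watch on a small example. The sketch below is our own illustration; the set $\{0, 2, 5\}$ and the function name are assumptions made for the example and are not from [ILMZ11]. It records $|kA|$ for $k = 1, \dots, 8$ and the successive increments.

```python
def sumset_sizes(A, k_max):
    """Return the list [|1A|, |2A|, ..., |k_max A|]."""
    sizes, total = [], {0}
    for _ in range(k_max):
        total = {t + a for t in total for a in A}   # one more copy of A
        sizes.append(len(total))
    return sizes


A = {0, 2, 5}      # illustrative set, normalized so that min A = 0 and gcd = 1
sizes = sumset_sizes(A, 8)
increments = [b - a for a, b in zip(sizes, sizes[1:])]
print(sizes)
print(increments)
```

Here the increments are 3, 4 and then 5 from that point on: once the middle is full and both fringes have stabilized, each further copy of $A$ adds exactly $\max A - \min A = 5$ new elements, so $|kA|$ is linear in $k$.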



We end with a few examples of the previous theorems. In these theorems no
effort was made to optimize the arguments and generate minimal such sets; this
would be an interesting future project, as it is almost surely possible to construct
examples of sets with the above properties that contain many fewer elements. In
particular, the base expansion method of combining sets is extremely inefficient. An
alternative, which is discussed briefly above, is the multiple fringes method. This
allows for much smaller sets; however, the requirements for the method to work
are very stringent, and the proofs are messy. Therefore we find it best to give the
constructions using the base expansion method instead.

• If we set

$$A = \{0, 1, 3, 4, 5, 9, 33, 34, 35, 50, 54, 55, 56, 58, 59, 60\} \qquad (14)$$

then

$$|A+A+A+A| > |A+A+A-A|. \qquad (15)$$

• If we take

$$A = \{0, 1, 3, 4, 7, 26, 27, 29, 30, 33, 37, 38, 40, 41, 42, 43, 46, 49, 50, 52, 53, 54, 72, 75, 76, 78, 79, 80\} \qquad (16)$$

then

$$|A+A| > |A-A| \quad \text{and} \quad |A+A+A+A| > |A+A-A-A|; \qquad (17)$$

in other words, $A$ is 2-generational.

• If we let

$$A = \{0, 1, 3, 4, 5, 6, 11, 50, 51, 53, 54, 55, 56, 61, 97, 132, 137, 138, 140, 142, 143, 144, 182, 187, 188, 189, 190, 192, 193, 194\} \qquad (18)$$

then

$$|4A - A| > |5A| \quad \text{and} \quad |4A - A| > |3A - 2A|. \qquad (19)$$
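
The definition of a $k$-generational set translates directly into a finite check. The following sketch is our own, with hypothetical helper names. It tests whether $A, 2A, \dots, kA$ are each sum-dominant and applies the test to the set in (16).

```python
def k_fold_sumset(A, k):
    """Return kA = A + ... + A with k summands (k >= 1)."""
    total = set(A)
    for _ in range(k - 1):
        total = {t + a for t in total for a in A}
    return total


def is_sum_dominant(B):
    """True if |B + B| > |B - B|."""
    sums = {x + y for x in B for y in B}
    diffs = {x - y for x in B for y in B}
    return len(sums) > len(diffs)


def is_k_generational(A, k):
    """True if A, 2A, ..., kA are each sum-dominant."""
    return all(is_sum_dominant(k_fold_sumset(A, c)) for c in range(1, k + 1))


# The set from (16), which the text describes as 2-generational.
A16 = {0, 1, 3, 4, 7, 26, 27, 29, 30, 33, 37, 38, 40, 41, 42, 43, 46, 49, 50, 52,
       53, 54, 72, 75, 76, 78, 79, 80}
print(is_k_generational(A16, 2))
```

By way of contrast, the set $\{0, 1, 2, 4, 7, 8, 12, 14, 15\}$ is sum-dominant but not 2-generational: its sumset is $[0, 30] \setminus \{25\}$, for which both the sumset and the difference set are full intervals of 61 elements.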